Introduction

Welcome to my first tutorial! We will use the classic iris data set to build a simple, line-by-line beginner's guide to creating interactive data visualizations. We will go through how to create:

1. 3D Scatter plots
2. 2D Scatter plots
3. Bar graphs
4. Histograms
5. Box plots
6. Density plots
7. Pie charts
8. Extra: Correlation matrices (non-interactive)

NOTE: In sections that have multiple tabs, the first tab will always include more comments and descriptions of the code. This is because the code in the remaining tabs is similar to the code in the first.

Load needed packages

First, we will load all packages that we will need.
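The package list below is a sketch rather than the exact chunk from this tutorial: it assumes the packages named later in the text (GGPLOT2, PLOTLY, CORRPLOT, PSYCH, CORRR) plus dplyr for the summary table used in the pie chart section, and that all of them are already installed.

```r
# Assumed package set, inferred from the functions named in this tutorial.
library(ggplot2)   # ggplot() for the static graphs
library(plotly)    # ggplotly() and plot_ly() for interactivity
library(dplyr)     # count() for summary tables (an assumption)
library(corrplot)  # corrplot.mixed()
library(psych)     # pairs.panels()
library(corrr)     # correlate() and network_plot()
```

Loading psych after ggplot2 prints a message about masked objects; that is expected and harmless for the plots here.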

Load the iris data set. This data set is one of the data sets built into base R, so we do not need to load it from an external file.
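A minimal way to do this, assuming nothing beyond base R, might look like the following. The call to data() is optional because iris is available by default, but it makes the step explicit.

```r
# Make the built-in iris data frame explicitly available in the workspace.
data(iris)

# Quick structural check: 150 observations of 5 variables.
str(iris)
```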

Taking a glance at the data

We will first check the first and last 6 rows of our data to get a feel for what it looks like.
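Output like the one shown in this section can be produced with the base R functions head() and tail(), which both return 6 rows by default. The variable names are only for illustration.

```r
# First six rows of the data
first_rows <- head(iris)
first_rows

# Last six rows of the data
last_rows <- tail(iris)
last_rows
```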

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 145          6.7         3.3          5.7         2.5 virginica
## 146          6.7         3.0          5.2         2.3 virginica
## 147          6.3         2.5          5.0         1.9 virginica
## 148          6.5         3.0          5.2         2.0 virginica
## 149          6.2         3.4          5.4         2.3 virginica
## 150          5.9         3.0          5.1         1.8 virginica


Interactive Data Visualizations

For most of our visualizations we will use ggplot() from GGPLOT2 to create our graphs and ggplotly() from PLOTLY to make them interactive, unless otherwise noted.

3D scatter plots

We will begin by creating an interactive 3D scatter plot using plot_ly() from the PLOTLY package.

View different angles of the plot by clicking and dragging with your mouse. You can also change which species the graph shows by clicking on the legend names.
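As a sketch of what such a plot_ly() call could look like: the choice of the three axes (sepal length, sepal width, petal length) and the marker size are assumptions, since the text does not say which variables the original plot uses.

```r
library(plotly)

# 3D scatter plot; the three axis variables are an illustrative choice.
fig_3d <- plot_ly(
  data = iris,
  x = ~Sepal.Length,
  y = ~Sepal.Width,
  z = ~Petal.Length,
  color = ~Species,        # one legend entry (and color) per species
  type = "scatter3d",
  mode = "markers",
  marker = list(size = 4)
)

# Readable axis titles for the 3D scene
fig_3d <- layout(
  fig_3d,
  scene = list(
    xaxis = list(title = "Sepal length"),
    yaxis = list(title = "Sepal width"),
    zaxis = list(title = "Petal length")
  )
)

fig_3d
```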


2D scatter plots

Next, we will take a look at how to create interactive 2D scatter plots.

Sepal plot

Scatter plot of sepal width by length:
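The two-step pattern described above (build with ggplot(), then wrap in ggplotly()) can be sketched as follows. The colors, labels, and theme are assumptions, not necessarily those of the original plot.

```r
library(ggplot2)
library(plotly)

# Step 1: build a static scatter plot, one color per species.
sepal_scatter <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  labs(title = "Sepal width by length", x = "Sepal length", y = "Sepal width") +
  theme_minimal()

# Step 2: convert the ggplot object into an interactive plotly widget.
sepal_scatter_int <- ggplotly(sepal_scatter)
sepal_scatter_int
```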

We have now created an interactive scatter plot! This will allow you to:
(1) Hover your mouse over the data points for more information.
(2) Click on the legend names to add or remove species from the plot.
(3) Click and drag with your mouse to zoom into a section of the plot.
(4) Hover over the plot to reveal the configuration toolbar in the top right corner, where you can change the plot settings.


Petal plot

Scatter plot of petal width by length:
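The petal version only swaps the two mapped columns. As before, this is a sketch with assumed labels and theme.

```r
library(ggplot2)
library(plotly)

# Same pattern as the sepal plot, with the petal measurements on the axes.
petal_scatter <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width, color = Species)) +
  geom_point() +
  labs(title = "Petal width by length", x = "Petal length", y = "Petal width") +
  theme_minimal()

petal_scatter_int <- ggplotly(petal_scatter)
petal_scatter_int
```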


Bar plots

Let's move on to creating some interactive bar graphs. Each tab shows a bar graph for a different variable.

Sepal width

Bar graph of sepal width:
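The text does not say what the bars measure, so the sketch below assumes one bar per species showing the mean sepal width. If the original plots something else (counts, for example), only the summary step would change.

```r
library(ggplot2)
library(plotly)

# Assumption: summarise to the mean sepal width of each species.
sepal_width_means <- aggregate(Sepal.Width ~ Species, data = iris, FUN = mean)

# geom_col() draws bars whose heights are the values in the data.
sepal_width_bar <- ggplot(sepal_width_means, aes(x = Species, y = Sepal.Width, fill = Species)) +
  geom_col() +
  labs(title = "Mean sepal width by species", x = "Species", y = "Mean sepal width") +
  theme_minimal()

sepal_width_bar_int <- ggplotly(sepal_width_bar)
sepal_width_bar_int
```

The other three tabs would follow the same pattern with Sepal.Length, Petal.Width, or Petal.Length in place of Sepal.Width.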


Sepal length

(For a detailed explanation of how to create the bar graph, check the first tab.)

Bar graph of sepal length:


Petal width

(For a detailed explanation of how to create the bar graph, check the first tab.)

Bar graph of petal width:


Petal length

(For a detailed explanation of how to create the bar graph, check the first tab.)

Bar graph of petal length:


Histograms

Here we will create some interactive histograms. Each tab shows a histogram for a different variable.

Sepal width

Histogram of sepal width:
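A possible version of this histogram is sketched below. The bin width of 0.2 and the overlapping, semi-transparent bars per species are assumptions chosen for readability.

```r
library(ggplot2)
library(plotly)

# Overlapping histograms, one per species; binwidth is an illustrative choice.
sepal_width_hist <- ggplot(iris, aes(x = Sepal.Width, fill = Species)) +
  geom_histogram(binwidth = 0.2, color = "white", alpha = 0.7, position = "identity") +
  labs(title = "Histogram of sepal width", x = "Sepal width", y = "Count") +
  theme_minimal()

sepal_width_hist_int <- ggplotly(sepal_width_hist)
sepal_width_hist_int
```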


Sepal length

(For a detailed explanation of how to create the histogram, check the first tab.)

Histogram of sepal length:


Petal width

(For a detailed explanation of how to create the histogram, check the first tab.)

Histogram of petal width:


Petal length

(For a detailed explanation of how to create the histogram, check the first tab.)

Histogram of petal length:


Box plots

We will now move on to creating interactive box plots. Each tab shows a box plot for a different variable.

Sepal width

Box plot of sepal width:
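A box plot of this kind could be built as follows, assuming one box per species. Hovering over a box in the interactive version shows its quartiles and median.

```r
library(ggplot2)
library(plotly)

# One box per species, comparing the distribution of sepal width.
sepal_width_box <- ggplot(iris, aes(x = Species, y = Sepal.Width, fill = Species)) +
  geom_boxplot() +
  labs(title = "Box plot of sepal width", x = "Species", y = "Sepal width") +
  theme_minimal()

sepal_width_box_int <- ggplotly(sepal_width_box)
sepal_width_box_int
```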


Sepal length

(For a detailed explanation of how to create the box plot, check the first tab.)

Box plot of sepal length:


Petal width

(For a detailed explanation of how to create the box plot, check the first tab.)

Box plot of petal width:


Petal length

(For a detailed explanation of how to create the box plot, check the first tab.)

Box plot of petal length:


Density plots

Now we will attempt to create some interactive density plots. Each tab shows a density plot for a different variable.

Sepal width

Density plot of sepal width:
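One way to sketch this density plot is with geom_density(), using a semi-transparent fill so the overlapping species curves stay visible. The alpha value is an assumption.

```r
library(ggplot2)
library(plotly)

# Kernel density estimate of sepal width for each species.
sepal_width_density <- ggplot(iris, aes(x = Sepal.Width, fill = Species)) +
  geom_density(alpha = 0.5) +
  labs(title = "Density plot of sepal width", x = "Sepal width", y = "Density") +
  theme_minimal()

sepal_width_density_int <- ggplotly(sepal_width_density)
sepal_width_density_int
```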


Sepal length

(For a detailed explanation of how to create the density plot, check the first tab.)

Density plot of sepal length:


Petal width

(For a detailed explanation of how to create the density plot, check the first tab.)

Density plot of petal width:


Petal length

(For a detailed explanation of how to create the density plot, check the first tab.)

Density plot of petal length:


Pie Charts

Now let's move on to creating an interactive pie chart.

We will create a tibble with the number of each species in our data and then use this data to create our pie chart.
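The two steps described above can be sketched as follows, assuming dplyr's count() for the summary and plot_ly() with type = "pie" for the chart (ggplotly() does not handle polar-coordinate pie charts well, so calling plot_ly() directly is a reasonable guess at the approach).

```r
library(dplyr)
library(plotly)

# Step 1: a tibble with one row per species and its number of observations.
species_counts <- count(tibble::as_tibble(iris), Species)
species_counts

# Step 2: an interactive pie chart built from that summary table.
species_pie <- plot_ly(
  data = species_counts,
  labels = ~Species,
  values = ~n,
  type = "pie"
)
species_pie <- layout(species_pie, title = "Number of observations per species")
species_pie
```

Because iris has 50 flowers of each species, the three slices come out equal in size.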


Correlation matrices & network plots

Let's now take a look at how to create correlation matrices and network plots in order to gauge how our variables correlate with one another. These are not interactive.

These correlation visualizations show the same underlying correlations in different ways. I have included several types for educational purposes so that you can pick the one that best fits your needs.

Correlation with CORRPLOT

Here, we will use the corrplot.mixed() function from the CORRPLOT package to create an effective correlation matrix with very little code.

This correlation matrix is very easy to create and is also useful because:
(1) It orders the variables by hierarchical clustering, which groups strongly correlated variables together and makes stronger and weaker correlations easier to tell apart, and
(2) It draws ellipses whose thickness and tilt show the strength and direction of each correlation. This makes it easy to gauge the correlations visually at a glance.
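A minimal sketch of such a call is shown below. It assumes the matrix is computed from the four numeric columns with Pearson correlation, with numbers in the lower triangle and ellipses in the upper one; the exact arguments of the original call are not given in the text.

```r
library(corrplot)

# Correlation matrix of the four numeric measurements (Species is dropped).
iris_cor <- cor(iris[, 1:4])

# Numbers below the diagonal, ellipses above, variables ordered by
# hierarchical clustering.
corrplot.mixed(
  iris_cor,
  lower = "number",
  upper = "ellipse",
  order = "hclust",
  tl.col = "black"
)
```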


Correlation with PSYCH

Here, we will use the pairs.panels() function from the PSYCH package to create a detailed correlation matrix visualization of all our features.

This is another very useful correlation matrix because:
(1) It gives us the standard correlation values, with text sizes that scale with the correlation strength,
(2) It shows a histogram of each of our variables, and
(3) It gives informative scatter plots that display the relationships between the variables. These data points are colored by Species. Such pretty. Much wow.
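The features listed above map onto pairs.panels() arguments roughly as in this sketch. The specific colors and the choice to turn off the correlation ellipses are assumptions.

```r
library(psych)

# Only the four numeric measurements go into the matrix.
iris_features <- iris[, 1:4]

pairs.panels(
  iris_features,
  method = "pearson",      # correlation coefficient shown in the upper panels
  scale = TRUE,            # scale the correlation text by correlation strength
  hist.col = "lightblue",  # histogram color on the diagonal
  ellipses = FALSE,        # assumption: no correlation ellipses on the scatter plots
  pch = 21,                # filled points so that bg sets the fill color
  bg = c("red", "green3", "blue")[iris$Species]  # one color per species
)
```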


Correlation with CORRR

Use network_plot() from the CORRR package to create a correlation network.

The resulting network plot places highly correlated variables closer together and connects them with more opaque lines. Variables with a low correlation are placed further apart and connected with lighter lines.

Although this plot gives us less information than the previous two, it still demonstrates the correlations between variables well and could be especially useful for presenting correlations to audiences without a statistics background.
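A network plot like this can be sketched in two steps with CORRR: correlate() builds a correlation data frame, and network_plot() draws it. The min_cor threshold of 0.3 (the function's default) is an assumption about the original call.

```r
library(corrr)

# Correlation data frame of the four numeric measurements.
iris_cor_df <- correlate(iris[, 1:4])

# Draw only the links whose absolute correlation is at least min_cor.
iris_network <- network_plot(iris_cor_df, min_cor = 0.3)
iris_network
```

Raising min_cor hides weaker links, which can make the plot easier to read when there are many variables.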


Thank you very much for checking out my first tutorial! Please upvote if you found it helpful, or leave a comment if you have any suggestions for improvements. :)